The push for accountability in public schooling has extended to the measurement of teacher performance,
accelerated by federal efforts through Race to the Top. Currently, a large number of states and districts across 
the country are computing measures of teacher performance based on the standardized test scores of their 
students and using them to help categorize teachers as effective or ineffective. 

The market for assessments coupled with derivative products that compute teacher effectiveness measures,
each promoting a particular methodology, is becoming increasingly competitive. Simultaneously, a large
number of research studies have come to light regarding the strengths and limitations of different
methodological choices.

With the implementation of the Common Core State Standards in the majority of states and the development 
of specific sets of assessments that align with them, it is particularly urgent to open up the field right now to 
talk about the best way to compute teacher performance measures based on the evidence that has been 
compiled. 

On October 10-11, 2013, researchers Cassandra Guarino, Mark Reckase, and Jeffrey Wooldridge therefore
hosted an IES-sponsored conference at Michigan State University on the timely topic of "Using Student Test
Scores to Measure Teacher Performance: The State of the Art in Research and Practice."

The conference brought together more than 80 researchers, policy-makers, and practitioners from across the 
U.S. to discuss how to promote best practices in constructing and implementing value-added and growth 
measures of teacher performance. 

The primary goal of the conference was to make a positive impact on policy at a critical point in time, by (1) 
raising awareness regarding the strengths and weaknesses of different methods of computing teacher 
performance measures and (2) discussing how to disseminate this information in a way that will enable school 
systems to make informed choices as they implement teacher evaluation systems. 

The conference specifically addressed pressing issues related to choosing and implementing particular 
methodologies. Panelists from the National Center for Assessment, American Institutes for Research, the 
North Carolina Department of Public Instruction, the Value-Added Research Center, the American Federation 
of Teachers, and the Council of Chief State School Officers provided many insights on implementation. In 
particular, they highlighted the following six issues: 

1) Shift in the focus of accountability to teachers: The focus on school accountability has shifted the theory 
of action over to classroom instruction, with the idea that great teachers are the primary instruments to 
close the achievement gap.

2) Credibility and communication are key: Stakeholders are looking for transparency and a language for
talking about teacher performance models that they can understand. Moreover, there are many
methodological and logistical choices to be made in constructing teacher performance measures, so it is
important to be able to explain the nuances in the system.

3) Certain methodologies are more controversial than others: Value-added models are viewed as highly 
controversial, and it is preferable not to label methods using that terminology. 

4) Modeling choices are not always made on the basis of better models: States and districts may opt for 
models that are less capable of identifying teacher effectiveness than others because the language 
currently used to explain them is easier for stakeholders to understand. 

5) Decisions about which students and teachers to include in the calculations are important and not
simple: It is difficult to include highly mobile students, and, currently, fewer than half of teachers can be
assessed with value-added or growth models due to the subjects and grades tested.

6) Preparation is an important feature of successful implementation: Given the fast timeline states and 
districts have to implement teacher evaluation systems, it is advisable to set up the infrastructure in 
advance and do dry runs with simulated data. 


Presentations by prominent researchers from several universities and research institutes focused on six main
topics and suggested the following:

1) Model specification tests: Model specification tests can be misleading. Although the results of such tests, 
such as the "falsification" tests discussed in Rothstein (2010), have been cited as evidence to discredit 
value-added models, they in fact tell us very little about whether or not particular models produce good 
estimates of teacher effectiveness. A value-added model can fail these tests and yet do a good job of 
identifying teacher effects. 

2) Comparisons of growth and value-added models: The Colorado Growth Model works as well as any 
value-added specification in measuring a teacher's effectiveness under conditions in which students are 
randomly assigned to teachers. However, value-added models that explicitly include both prior test scores 
and teacher variables are better in the many cases in which students are tracked into classrooms on the 
basis of prior performance and assigned to teachers in a nonrandom fashion. 

3) Influence of tracking and instructional patterning: Assessing teacher value-added is more difficult in 
middle and high school than in elementary school due to the prevalence of tracking and course 
sequencing. Accounting for current and prior tracks is important in estimating teacher performance to 
avoid bias. Using End of Course (EOC) exams is an alternative way to incorporate student performance 
measures into teacher evaluation but there are issues to be dealt with regarding course timing and the fact 
that not all students take EOC exams for all courses. 

4) Inclusion of classroom measures in models: The inclusion of classroom measures in value-added models 
can substantially affect how teachers are classified. Moreover, it can be difficult to disentangle the teacher 
effect from the peer effect when students are non-randomly assigned to teachers. Measures based on 
multiple classrooms and years of data can help alleviate this issue somewhat.

5) Methods of linking teachers to students: Difficulties in properly linking teachers to students can stem 
from inaccuracies in class rosters and team teaching, especially at the elementary level. Although a few 
districts have undertaken efforts to confirm class rosters, many districts link large proportions of students 
to the wrong teachers. Roster-confirmed data and estimators that assign dosages to teachers can help 
account for these issues. 

6) Impact of measurement error: Measurement error in test scores is not properly addressed by commonly 
applied correction techniques that rely on classical error assumptions. However, simpler methods of 
accounting for this problem may be more useful. 
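
The second research point above, that models including both prior test scores and teacher variables hold up
better when students are tracked, can be illustrated with a small simulation. The sketch below is not any of
the models presented at the conference. It assumes a hypothetical data-generating process in which the current
score depends on the prior score with persistence below 1, plus a teacher effect and noise, and in which the
lower-scoring half of students is assigned to one teacher. All function names and parameter values are
invented for illustration. It compares a naive difference in mean gains with a regression of the current score
on the prior score and teacher indicators.

```python
import random
from statistics import mean


def simulate_tracked_classes(n_per_class=2000, persistence=0.7,
                             effects=(0.0, 0.2), noise_sd=0.5, seed=0):
    """Simulate two classrooms with students tracked by prior score.

    All parameter values are hypothetical. Teacher 0 receives the
    higher-scoring half of students, teacher 1 the lower-scoring half.
    Returns a list of (teacher, prior_score, current_score) tuples.
    """
    rng = random.Random(seed)
    priors = sorted(rng.gauss(0.0, 1.0) for _ in range(2 * n_per_class))
    rows = []
    for i, prior in enumerate(priors):
        teacher = 1 if i < n_per_class else 0  # lowest priors go to teacher 1
        score = (persistence * prior + effects[teacher]
                 + rng.gauss(0.0, noise_sd))
        rows.append((teacher, prior, score))
    return rows


def naive_gain_gap(rows):
    """Teacher 1 minus teacher 0 difference in mean gain (score - prior)."""
    gains = {0: [], 1: []}
    for teacher, prior, score in rows:
        gains[teacher].append(score - prior)
    return mean(gains[1]) - mean(gains[0])


def lagged_score_gap(rows):
    """Teacher 1 minus teacher 0 effect from OLS of score on prior score
    and teacher indicators, computed via within-teacher demeaning."""
    groups = {0: [], 1: []}
    for teacher, prior, score in rows:
        groups[teacher].append((prior, score))
    sxy = 0.0
    sxx = 0.0
    means = {}
    for teacher, obs in groups.items():
        mx = mean([p for p, _ in obs])
        my = mean([s for _, s in obs])
        means[teacher] = (mx, my)
        sxy += sum((p - mx) * (s - my) for p, s in obs)
        sxx += sum((p - mx) ** 2 for p, _ in obs)
    slope = sxy / sxx  # pooled within-teacher slope on the prior score
    intercepts = {t: my - slope * mx for t, (mx, my) in means.items()}
    return intercepts[1] - intercepts[0]


rows = simulate_tracked_classes()
print("true gap (by construction):", 0.2)
print("naive gain gap:            ", round(naive_gain_gap(rows), 3))
print("lagged-score gap:          ", round(lagged_score_gap(rows), 3))
```

Under these assumptions the naive gain comparison should land well above the true gap, because students with
low prior scores tend to gain more when persistence is below 1, while the lagged-score estimate should stay
close to it. With random assignment the two comparisons would be expected to agree much more closely.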

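The fifth research point above mentions estimators that assign dosages to teachers. As a rough illustration of
the weighting idea only, the sketch below computes a dosage-weighted mean gain per teacher, so that a student
shared between two teachers counts fractionally toward each. The estimators discussed at the conference are
regression-based and more involved. The function name, the data layout, and the example values here are all
hypothetical.

```python
from collections import defaultdict


def dosage_weighted_gains(links, gains):
    """Return a dict mapping each teacher to a dosage-weighted mean gain.

    links: iterable of (student_id, teacher_id, dosage), where dosage is
           the assumed share of the student's instruction from that teacher.
    gains: dict mapping student_id to that student's score gain.
    """
    weighted_sum = defaultdict(float)
    total_dosage = defaultdict(float)
    for student, teacher, dosage in links:
        weighted_sum[teacher] += dosage * gains[student]
        total_dosage[teacher] += dosage
    return {t: weighted_sum[t] / total_dosage[t]
            for t in total_dosage if total_dosage[t] > 0}


# Hypothetical roster: student s2 is team-taught, half with each teacher.
example_gains = {"s1": 10.0, "s2": 4.0, "s3": 6.0}
example_links = [
    ("s1", "A", 1.0),
    ("s2", "A", 0.5),
    ("s2", "B", 0.5),
    ("s3", "B", 1.0),
]
result = dosage_weighted_gains(example_links, example_gains)
print(result)
```

In this toy roster, teacher A's measure is (10 x 1.0 + 4 x 0.5) / 1.5 = 8, whereas linking s2 entirely to
teacher A would give (10 + 4) / 2 = 7, which shows how roster errors can move a teacher's measure.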

For more information about the conference, please go to http://vam.educ.msu.edu/. The website contains
links to videos of all conference panel discussions and presentations, the presentation slides, and papers to
provide information to policy-makers, researchers, and educators.